Video games have become one of society’s most popular forms of entertainment. They are a unique medium that combines interactive storytelling with tests of player skill, and they come in many forms. Understanding how games are designed and received is important both for innovation during development and for adjusting games to public demand. Examining aspects such as genre choice, the developers behind a title, and how players respond after launch can help developers make key adjustments, inform academic research and social commentary, and guide players toward worthwhile purchases.
Our study analyzes a dataset of the most popular video games released between 1980 and 2023. The analysis has three parts: Genre Analysis, Developer Analysis, and Player Engagement Analysis. Through these, we hope to reveal what drives a video game’s longevity and success. Our primary objective is to give gaming professionals and developers insight into what distinguishes successful titles from fleeting ones.
We believe that analyzing a broad range of popular video games across time tells us a good deal about genre selection, the contribution of developers to a title’s success, and the factors that affect player engagement. These findings should help developers make informed production and promotion decisions based on player preferences, and help professionals understand market trends so that they can stay ahead of them. In summary, our goal is to answer questions about video game popularity and successful game development.
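# Packages used below (the setup chunk is not shown, so this list is inferred from the function calls)
library(dplyr); library(tidyr); library(stringr); library(readr)
library(ggplot2); library(DT); library(RColorBrewer)
# Row 1253 is dropped: it is an entry for a game that hasn't been released yet, but it still has the highest review score
dataset <- read.csv("D:/Faks/Year 3/Data Programming/Project/games.csv")[-1253, ]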
datatable(dataset, rownames = T, filter = "top", caption = "Games Data Set", options = list(searching = F, pageLength = 10, lengthMenu = c(5, 10, 15, 20), scrollX = T, autoWidth = T, columnDefs = list(
list(targets = c(9, 10), visible = FALSE)
)))
In this data table I have hidden the Summary and Reviews columns, because they contain long blocks of text that would make the table much wider.
colnames(dataset)
## [1] "X" "Title" "Release.Date"
## [4] "Team" "Rating" "Times.Listed"
## [7] "Number.of.Reviews" "Genres" "Summary"
## [10] "Reviews" "Plays" "Playing"
## [13] "Backlogs" "Wishlist"
With the str(dataset) function we display the internal structure of the data set.
str(dataset)
## 'data.frame': 1511 obs. of 14 variables:
## $ X : int 0 1 2 3 4 5 6 7 8 9 ...
## $ Title : chr "Elden Ring" "Hades" "The Legend of Zelda: Breath of the Wild" "Undertale" ...
## $ Release.Date : chr "Feb 25, 2022" "Dec 10, 2019" "Mar 03, 2017" "Sep 15, 2015" ...
## $ Team : chr "['Bandai Namco Entertainment', 'FromSoftware']" "['Supergiant Games']" "['Nintendo', 'Nintendo EPD Production Group No. 3']" "['tobyfox', '8-4']" ...
## $ Rating : num 4.5 4.3 4.4 4.2 4.4 4.3 4.2 4.3 3 4.3 ...
## $ Times.Listed : chr "3.9K" "2.9K" "4.3K" "3.5K" ...
## $ Number.of.Reviews: chr "3.9K" "2.9K" "4.3K" "3.5K" ...
## $ Genres : chr "['Adventure', 'RPG']" "['Adventure', 'Brawler', 'Indie', 'RPG']" "['Adventure', 'RPG']" "['Adventure', 'Indie', 'RPG', 'Turn Based Strategy']" ...
## $ Summary : chr "Elden Ring is a fantasy, action and open world game with RPG elements such as stats, weapons and spells. Rise, "| __truncated__ "A rogue-lite hack and slash dungeon crawler in which Zagreus, son of Hades the Greek god of the dead, attempts "| __truncated__ "The Legend of Zelda: Breath of the Wild is the first 3D open-world game in the Zelda series. Link can travel an"| __truncated__ "A small child falls into the Underground, where monsters have long been banished by humans and are hunting ever"| __truncated__ ...
## $ Reviews : chr "[\"The first playthrough of elden ring is one of the best eperiences gaming can offer you but after youve explo"| __truncated__ "['convinced this is a roguelike for people who do not like the genre. The art is technically good but the aesth"| __truncated__ "['This game is the game (that is not CS:GO) that I have played the most ever. I have played this game for 400 h"| __truncated__ "['soundtrack is tied for #1 with nier automata. a super charming story and characters which have become iconic"| __truncated__ ...
## $ Plays : chr "17K" "21K" "30K" "28K" ...
## $ Playing : chr "3.8K" "3.2K" "2.5K" "679" ...
## $ Backlogs : chr "4.6K" "6.3K" "5K" "4.9K" ...
## $ Wishlist : chr "4.8K" "3.6K" "2.6K" "1.8K" ...
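Release.Date is stored as text such as "Feb 25, 2022", so any analysis over time would first need real dates. The following is a minimal sketch in base R and is not part of the original analysis. The helper name parse_release_date and the "TBD" placeholder value are assumptions made for illustration. It matches month names against month.abb instead of relying on the %b format, so it does not depend on the system locale.

```r
# Hypothetical helper: convert strings like "Feb 25, 2022" to Date values.
# Anything that does not match the expected pattern becomes NA.
parse_release_date <- function(x) {
  pattern <- "^([A-Za-z]{3}) ([0-9]{1,2}), ([0-9]{4})$"
  ok <- grepl(pattern, x)
  out <- rep(as.Date(NA), length(x))

  month <- match(sub(pattern, "\\1", x[ok]), month.abb)  # month.abb is always English
  day   <- as.integer(sub(pattern, "\\2", x[ok]))
  year  <- as.integer(sub(pattern, "\\3", x[ok]))

  iso <- sprintf("%04d-%02d-%02d", year, month, day)
  out[ok] <- as.Date(iso, format = "%Y-%m-%d")
  out
}

# First two values come from the str() output; "TBD" is an invented placeholder
release_date <- parse_release_date(c("Feb 25, 2022", "Dec 10, 2019", "TBD"))
release_year <- as.integer(format(release_date, "%Y"))
print(release_year)
```

On the full data this could be applied as dataset$Release.Date <- parse_release_date(dataset$Release.Date), after checking how many rows come back as NA.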
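With the summary(dataset) function we compute summary statistics for each column. Only the numeric columns (X and Rating) produce meaningful statistics, because the count columns such as Plays and Backlogs are stored as text. Rating has 13 missing values.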
summary(dataset)
## X Title Release.Date Team
## Min. : 0.0 Length:1511 Length:1511 Length:1511
## 1st Qu.: 377.5 Class :character Class :character Class :character
## Median : 755.0 Mode :character Mode :character Mode :character
## Mean : 755.2
## 3rd Qu.:1132.5
## Max. :1511.0
##
## Rating Times.Listed Number.of.Reviews Genres
## Min. :0.700 Length:1511 Length:1511 Length:1511
## 1st Qu.:3.400 Class :character Class :character Class :character
## Median :3.800 Mode :character Mode :character Mode :character
## Mean :3.719
## 3rd Qu.:4.100
## Max. :4.600
## NA's :13
## Summary Reviews Plays Playing
## Length:1511 Length:1511 Length:1511 Length:1511
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## Backlogs Wishlist
## Length:1511 Length:1511
## Class :character Class :character
## Mode :character Mode :character
##
##
##
##
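The summary above reports only Length, Class, and Mode for columns such as Plays and Wishlist, because values like "17K" are read as text. A minimal sketch of one way to convert them follows. The helper name parse_count is an assumption for illustration, and the sketch assumes that "K" is the only suffix that occurs, which is all the str() output shows.

```r
# Hypothetical helper: turn "3.9K" into 3900 and "679" into 679.
# Assumes the only suffix in the data is a trailing "K" (thousands).
parse_count <- function(x) {
  x <- trimws(x)
  multiplier <- ifelse(grepl("K$", x), 1000, 1)
  # round() removes floating-point noise, since the counts are whole numbers
  round(as.numeric(sub("K$", "", x)) * multiplier)
}

# Sample values taken from the str() output above
counts <- parse_count(c("3.9K", "17K", "679"))
print(counts)
print(summary(counts))
```

Once the columns are numeric, summary() reports quartiles and means for them in the same way it does for Rating.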
The first analysis we conduct is Genre Analysis. Here we focus on which genres are rated most highly by gamers, in order to understand which genres are more likely to succeed and attract a larger audience. By examining the average ratings of different genres, we can understand the preferences and tastes of gamers. This information is valuable for game developers and publishers, as it can guide their decisions during game development.
# Create a new dataset for genre analysis
genre_dataset <- dataset
# Converting Genres column to characters
genre_dataset <- genre_dataset %>%
mutate(Genres = as.character(Genres))
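# Splitting the genre column into separate genres
# (str_extract_all keeps the surrounding quotes, so they are stripped afterwards)
genre_dataset <- genre_dataset %>%
mutate(Genres = str_extract_all(Genres, "'(.*?)'")) %>%
unnest(Genres) %>%
mutate(Genres = str_remove_all(Genres, "'"))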
# Calculating average rating per genre
genre_ratings <- genre_dataset %>%
group_by(Genres) %>%
summarize(AverageRating = mean(Rating, na.rm = TRUE))
# Creating a column chart with genre names on the x-axis and ratings on the y-axis
ggplot(data = genre_ratings, aes(x = reorder(Genres, AverageRating), y = AverageRating)) +
geom_col(fill = "steelblue", width = 0.7) +
labs(x = "Genre", y = "Average Rating", title = "Average Rating by Genre") +
scale_y_continuous(breaks = seq(0, ceiling(max(genre_ratings$AverageRating)), by = 0.5)) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5, size = 8))
As the graph shows, the highest-rated genres among gamers are RPG, Turn Based Strategy, and Visual Novel, of which Visual Novel is the only genre with an average rating above 4.0.
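To make the genre calculation concrete, here is a small sketch on invented data, written in base R so it runs without extra packages. The three games and their ratings are made up for illustration. It shows that a game listed under several genres contributes its rating to each of them, and it reports how many games stand behind each average, which is worth checking before comparing genres.

```r
# Invented toy data in the same list-like text format as the Genres column
toy <- data.frame(
  Title  = c("Game A", "Game B", "Game C"),
  Rating = c(4.5, 4.0, 3.5),
  Genres = c("['Adventure', 'RPG']", "['RPG']", "['Adventure', 'Indie']"),
  stringsAsFactors = FALSE
)

# Extract every quoted genre name per row, then drop the quotes
genre_list <- regmatches(toy$Genres, gregexpr("'[^']*'", toy$Genres))
genre_list <- lapply(genre_list, function(g) gsub("'", "", g))

# One row per (game, genre) pair
long <- data.frame(
  Genre  = unlist(genre_list),
  Rating = rep(toy$Rating, lengths(genre_list)),
  stringsAsFactors = FALSE
)

# Average rating and number of games for each genre
avg <- tapply(long$Rating, long$Genre, mean)
n   <- tapply(long$Rating, long$Genre, length)
genre_summary <- data.frame(
  Genre         = names(avg),
  AverageRating = as.vector(avg),
  Games         = as.vector(n)
)
print(genre_summary)
```

In this toy example "Game A" counts toward both Adventure and RPG, so the genre averages are not independent of each other.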
The second analysis focuses on developer companies. With it we hope to uncover which video game developers are the most prominent in the industry, by counting how many of the popular games in the dataset each company has worked on. This information is valuable for investors, as it can help guide their decisions about which companies to invest in.
# Creating a new dataset for team analysis
team_dataset <- dataset
# Converting Team column to character type
team_dataset <- team_dataset %>%
mutate(Team = as.character(Team))
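# Splitting the Team column into separate teams
# (str_extract_all keeps the surrounding quotes, so they are stripped afterwards)
team_dataset <- team_dataset %>%
mutate(Team = str_extract_all(Team, "'(.*?)'")) %>%
unnest(Team) %>%
mutate(Team = str_remove_all(Team, "'"))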
# Counting the number of games developed per team
team_counts <- team_dataset %>%
group_by(Team) %>%
summarize(GamesDeveloped = n())
# Selecting the top 5 teams with the most games developed
top_teams <- team_counts %>%
top_n(5, wt = GamesDeveloped) # Select top 5 teams based on games developed
# Defining a bright color palette
colors <- brewer.pal(length(top_teams$GamesDeveloped), "Set1")
# Creating a pie chart to visualize the distribution of games developed among the top 5 teams
ggplot(data = top_teams, aes(x = "", y = GamesDeveloped, fill = Team)) +
geom_bar(stat = "identity", width = 1) +
coord_polar("y", start = 0) +
labs(fill = "Team", x = NULL, y = NULL, title = "Distribution of Games Developed\namong Game Development Companies") +
theme_void() +
theme(legend.position = "bottom") +
geom_text(aes(label = paste(GamesDeveloped)), position = position_stack(vjust = 0.5)) +
scale_fill_manual(values = colors)
As the graph shows, the most prominent game development companies in this dataset are Capcom, Electronic Arts, Nintendo, Sega, and Square Enix. Nintendo leads by a wide margin, having produced 245 of the games on this list.
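The number of games on the list measures how prolific a company is, which is not necessarily the same as how well its games are received. The sketch below, on invented data, puts the game count next to the average rating for each team. The studio names and ratings are made up, and this comparison is an illustration rather than part of the original analysis.

```r
# Invented toy data: one row per (game, team) pair
toy <- data.frame(
  Team   = c("Studio A", "Studio A", "Studio A", "Studio B", "Studio B", "Studio C"),
  Rating = c(3.0, 3.5, 4.0, 4.5, 4.3, 4.6),
  stringsAsFactors = FALSE
)

# Average rating per team
per_team <- aggregate(Rating ~ Team, data = toy, FUN = mean)
names(per_team)[2] <- "AverageRating"

# Number of games per team, looked up by team name
game_counts <- table(toy$Team)
per_team$GamesDeveloped <- as.vector(game_counts[as.character(per_team$Team)])

# Most prolific teams first
per_team <- per_team[order(-per_team$GamesDeveloped), ]
print(per_team)
```

Here the team with the most games has the lowest average rating, which is why the two measures are worth reading together.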
The third analysis focuses on player engagement with highly rated video games. Its purpose is to compare how many players have played a given game against how many own the title but haven’t started playing it yet (the Backlogs column). We focus on the top 5 games by rating. With this analysis, we hope to better understand player engagement and ownership patterns in the top-rated games. It can benefit various stakeholders in the gaming industry, including game developers, publishers, and marketers.
# Creating a new dataset for analysis
analysis_dataset <- dataset
# Extracting necessary columns for the analysis
analysis_data <- analysis_dataset %>%
select(Title, Plays, Playing, Backlogs, Rating) %>%
mutate(
Plays = parse_number(Plays) * ifelse(grepl("K$", Plays), 1000, 1),
Playing = parse_number(Playing) * ifelse(grepl("K$", Playing), 1000, 1),
Backlogs = parse_number(Backlogs) * ifelse(grepl("K$", Backlogs), 1000, 1)
) %>%
arrange(desc(Rating)) %>%
head(5) # Select top 5 games based on rating
# Calculating the total number of players who have played the game
analysis_data <- analysis_data %>%
mutate(TotalPlayers = Plays + Playing)
# Calculating the total number of copies
analysis_data <- analysis_data %>%
mutate(TotalCopies = TotalPlayers + Backlogs)
# Calculating the percentage of non-players
analysis_data <- analysis_data %>%
mutate(NonPlayersPercentage = ceiling((Backlogs / TotalCopies) * 100))
# Creating a column chart to compare the number of players who have played the game and those who own it but haven't started
column_chart <- ggplot(data = analysis_data) +
geom_col(aes(x = as.numeric(factor(Title)), y = TotalPlayers, fill = "Total Players"), width = 0.4, position = position_dodge(width = 0.8)) +
geom_col(aes(x = as.numeric(factor(Title)) + 0.4, y = Backlogs, fill = "Non-Players"), width = 0.4, position = position_dodge(width = 0.8)) +
geom_text(aes(x = as.numeric(factor(Title)) + 0.2, y = TotalPlayers, label = TotalPlayers), vjust = 1.2, hjust = 1.3) +
geom_text(aes(x = as.numeric(factor(Title)) + 0.6, y = Backlogs, label = Backlogs), vjust = 1.1, hjust = 1.3) +
scale_x_continuous(breaks = as.numeric(factor(analysis_data$Title)), labels = analysis_data$Title) +
labs(x = "Game", y = "Number of Players", title = "Comparison of Players Played vs Non-Players \n (Top 5 Games by Rating)") +
scale_fill_manual(values = c("Total Players" = "steelblue", "Non-Players" = "orange")) +
theme_bw() +
coord_flip() +
scale_y_continuous(labels = scales::comma) +
guides(fill = guide_legend(title = "Status")) +
theme(legend.position = "top", axis.text.x = element_text(angle = 45, hjust = 1))
print(column_chart)
# Defining a bright color palette
colors <- brewer.pal(length(analysis_data$NonPlayersPercentage), "Set1")
# Creating a pie chart to visualize the distribution of non-player percentages
pie_chart <- ggplot(data = analysis_data, aes(x = "", y = NonPlayersPercentage, fill = Title)) +
geom_bar(stat = "identity", width = 1) +
coord_polar("y", start = 0) +
labs(x = NULL, y = NULL, title = "Percentage of Non-Players") +
theme_void() +
theme(legend.position = "right") +
geom_text(aes(label = paste((NonPlayersPercentage), "%")), position = position_stack(vjust = 0.5)) +
scale_fill_manual(values = colors)
# Displaying the pie chart of non-player percentages
print(pie_chart)
From these graphs we can conclude that the split between players and non-players differs considerably across these games, even though their ratings are quite similar. For instance, “Outer Wilds” has 8361 players, the highest of these 5 games, and 4800 non-players, which means that about 36% of the people who own “Outer Wilds” have not played it. “Disco Elysium: The Final Cut” shows a similar result, with a 40% non-play rate. On the other hand, a less popular game such as “Bloodborne: The Old Hunters” has a non-play rate of only 17%, which means that most of the people who bought the game did in fact play it and rated it highly.
In our exploratory data analysis, we gained valuable information about the gaming industry. We identified highly rated game genres such as RPG, Turn Based Strategy, and Visual Novel, and prominent game developers such as Capcom, Electronic Arts, Nintendo, Sega, and Square Enix. Additionally, we analyzed player engagement and ownership patterns, revealing variations in non-play rates among highly rated games.
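As a worked check of the non-play rate, the short sketch below repeats the calculation for “Outer Wilds” using the two figures quoted above. The variable names are our own. It also shows that the choice of rounding matters: the analysis code uses ceiling(), which always rounds up, so a label on the pie chart can be one point higher than the conventionally rounded value.

```r
# Figures for "Outer Wilds" as quoted in the text
played  <- 8361   # Plays + Playing
backlog <- 4800   # Backlogs (own the game but have not started it)

# Share of all copies that sit unplayed, in percent
non_play_rate <- backlog / (played + backlog) * 100

print(round(non_play_rate, 1))  # 36.5
print(ceiling(non_play_rate))   # 37, the rounding used in the analysis code
```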
These findings offer valuable information for game developers, publishers, investors, and marketers. They can use these findings to make informed decisions about game development, investment opportunities, and marketing strategies. Overall, this analysis contributes to a better understanding of the gaming industry and its dynamics.
This study was influenced by an article that investigates engagement strategies in popular video games (Dickey 2005).